Abstract. This article presents several alternatives to Pearson's correlation coefficient,
together with many examples. In samples where the rank of a discrete variable matters more
than its values, a mixture of Pearson's and Spearman's coefficients gives a better result.


Introduction 
Let's consider a bivariate sample, which consists of $n \ge 2$ pairs $(x, y)$. We denote these
pairs by:
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
where $x_i$ = the value of $x$ for the $i$-th observation,
and $y_i$ = the value of $y$ for the $i$-th observation,
for any $1 \le i \le n$.
We can construct a scatter plot in order to detect any relationship between variables x and
y, drawing a horizontal x-axis and a vertical y-axis, and plotting the points of coordinates
$(x_i, y_i)$ for all $i \in \{1, 2, \ldots, n\}$.


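We use the standard statistical notation, mostly used in regression analysis:

$$\sum x = \sum_{i=1}^{n} x_i, \qquad \sum y = \sum_{i=1}^{n} y_i, \qquad \sum xy = \sum_{i=1}^{n} x_i y_i,$$

$$\sum x^2 = \sum_{i=1}^{n} x_i^2, \qquad \sum y^2 = \sum_{i=1}^{n} y_i^2, \tag{1}$$

$$\bar{X} = \frac{\sum x}{n} = \text{the mean of sample variable } x,$$

$$\bar{Y} = \frac{\sum y}{n} = \text{the mean of sample variable } y.$$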


Let's introduce a notation for the median: 


$X_M$ = the median of sample variable $x$, (2)
$Y_M$ = the median of sample variable $y$.
Correlation Coefficients. 


The correlation coefficient of variables $x$ and $y$ shows how strongly the values of these
variables are related to one another. It is denoted by $r$, and $r \in [-1, 1]$.


If the correlation coefficient is positive, then both variables increase simultaneously
(or decrease simultaneously).


If the correlation coefficient is negative, then one variable increases while the other
decreases, and vice versa.


Therefore, the correlation coefficient measures the degree of linear association between two
variables.


We have a strong relationship if $r \in [0.8, 1]$ or $r \in [-1, -0.8]$;
a moderate relationship if $r \in (0.5, 0.8)$ or $r \in (-0.8, -0.5)$; (3)
and a weak relationship if $r \in [-0.5, 0.5]$.
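The intervals in (3) can be written as a small helper. This is only an illustrative sketch; the function name `relationship_strength` is our own and does not come from the article.

```python
def relationship_strength(r: float) -> str:
    """Classify a correlation coefficient using the intervals in (3)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # the intervals in (3) are symmetric about 0
    if size >= 0.8:
        return "strong"
    if size > 0.5:
        return "moderate"
    return "weak"


for value in (0.95, -0.65, 0.5, -0.8):
    print(value, relationship_strength(value))
```

Note that the boundary values 0.8 and -0.8 count as strong, while 0.5 and -0.5 count as weak, exactly as the closed and open intervals in (3) specify.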





The correlation coefficient does not depend on the measurement unit, nor on the order of the
variables: $(x, y)$ or $(y, x)$.


If $r = 1$ or $r = -1$, then there is a perfectly linear relationship between $x$ and $y$. If $r = 0$, or is close to 0, there is no strong linear relationship between $x$ and $y$.





The coefficient of determination, denoted by $r^2$, represents the proportion of variation in $y$
due to a linear relationship between $x$ and $y$ in the sample:

$$r^2 = \frac{SSTo - SSResid}{SSTo} = 1 - \frac{SSResid}{SSTo}, \tag{4}$$

where SSTo = total sum of squares

$$SSTo = \sum (y - \bar{Y})^2 = \sum_{i=1}^{n} (y_i - \bar{Y})^2, \tag{5}$$

and SSResid = residual sum of squares

$$SSResid = \sum (y - \hat{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{6}$$

with $\hat{y}_i$ = the $i$-th predicted value = $a + b x_i$ for $i \in \{1, 2, \ldots, n\}$,
resulting from substituting each sample $x$ value into the equation for the least-squares line

$$\hat{y} = a + bx,$$

where

$$b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} \tag{7}$$

and

$$a = \bar{Y} - b\bar{X}. \tag{8}$$

Obviously: coefficient of determination = (correlation coefficient)$^2$.
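The identity stated above can be checked numerically. The sketch below fits the least-squares line with formulas (7) and (8), computes $r^2$ from (4), and compares it with the square of the correlation coefficient. The five-point sample is hypothetical and chosen only for illustration, and the function name `least_squares_summary` is our own.

```python
import math


def least_squares_summary(xs, ys):
    """Return (a, b, r_squared, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n

    b = s_xy / s_xx                   # slope, formula (7)
    a = sum_y / n - b * sum_x / n     # intercept, formula (8)

    ss_to = s_yy                      # total sum of squares, formula (5)
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # formula (6)
    r_squared = 1 - ss_resid / ss_to  # formula (4)
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return a, b, r_squared, r


# Hypothetical sample, not taken from the article.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r_squared, r = least_squares_summary(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
print(f"r^2 from (4) = {r_squared:.6f}, r squared directly = {r * r:.6f}")
```

The two printed values of $r^2$ agree up to floating-point rounding.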


Two sample correlation coefficients are well-known: 


1) Pearson's sample correlation coefficient, let's denote it by $r_P$,

$$r_P = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}}, \tag{9}$$

which is the most popular;








and 2) Spearman's rank correlation coefficient, let's denote it by $r_S$, which is obtained
from the previous one by replacing, for each $i \in \{1, 2, \ldots, n\}$, $x_i$ by its rank in the
variable $x$, and similarly for $y_i$.





We propose more alternative sample correlation coefficients, obtained by replacing in
Pearson's formula (9):


3.1. Each $x_i$ by its deviation from the $x$ mean, $x_i - \bar{X}$,
and each $y_i$ by its deviation from the $y$ mean, $y_i - \bar{Y}$.


3.2. Each $x_i$ by its deviation from the $x$ minimum, $x_i - x_{\min}$, and each $y_i$ by its
deviation from the $y$ minimum, $y_i - y_{\min}$.


3.3. Each $x_i$ by its deviation from the $x$ maximum, $x_{\max} - x_i$, and each $y_i$ by its
deviation from the $y$ maximum, $y_{\max} - y_i$.


3.4. Each $x_i$ by its deviation from a given $x_k$ (for $k \in \{1, 2, \ldots, n\}$), $x_i - x_k$,
and each $y_i$ by its deviation from the corresponding given $y_k$, $y_i - y_k$.


Not surprisingly, all four of these alternative sample correlation coefficients are equal to
Pearson's, since they simply correspond to translations of the Cartesian axes, whose origin
$(0, 0)$ is moved to $(\bar{X}, \bar{Y})$, $(x_{\min}, y_{\min})$, $(x_{\max}, y_{\max})$, or $(x_k, y_k)$ respectively.
In case 3.3 both axes are also reversed, which changes the sign of both variables and so
leaves the coefficient unchanged.
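The reason a shift leaves formula (9) unchanged can be written out in one step. In the sketch below, $c$ and $d$ are arbitrary constants and $u_i$, $v_i$ denote the shifted values; these symbols are our own notation, not the article's.

```latex
\begin{aligned}
u_i &= x_i - c, \qquad v_i = y_i - d, \qquad
\bar{u} = \bar{X} - c, \qquad \bar{v} = \bar{Y} - d, \\
\sum uv - \frac{(\sum u)(\sum v)}{n}
  &= \sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})
   = \sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})
   = \sum xy - \frac{(\sum x)(\sum y)}{n}, \\
\sum u^2 - \frac{(\sum u)^2}{n}
  &= \sum_{i=1}^{n} (u_i - \bar{u})^2
   = \sum_{i=1}^{n} (x_i - \bar{X})^2
   = \sum x^2 - \frac{(\sum x)^2}{n}.
\end{aligned}
```

The same holds for $y$, so the numerator and both factors of the denominator in (9) are unchanged. With $u_i = c - x_i$ and $v_i = d - y_i$ (the form used for the maximum), each deviation only changes sign, so the products and squares are again the same.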


Example: Let the variables x, y be given below: 


x |   6 |   7 |  12 |  14 |  23 |   41 |   53 |   60 |   69 |   72
y | 2.5 | 1.1 | 6.3 | 2.1 | 2.9 | 15.3 | 20.7 | 18.4 | 22.0 | 33.0


Table 1 


and their scatter plot: 
Graph 1
1) Calculating Pearson's correlation coefficient: 
$\sum x = 357$; $\bar{X} = 35.7$;
$\sum y = 124.3$; $\bar{Y} = 12.43$;
$\sum x^2 = 18{,}989$;
$\sum y^2 = 2{,}634.11$;
$\sum xy = 6{,}916.8$;


$r_P = 0.95075$.
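The calculation above can be reproduced with a few lines of Python. This is a minimal sketch that applies formula (9) to the ten pairs of Table 1 (the values used here reproduce the sums listed above); the function name `pearson` is our own.

```python
import math

# The ten (x, y) pairs of Table 1.
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22.0, 33.0]


def pearson(xs, ys):
    """Pearson's coefficient computed from the raw sums, as in formula (9)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


r_p = pearson(x, y)
print("sum x  =", sum(x))
print("sum x2 =", sum(a * a for a in x))
print("r_P    =", round(r_p, 5))
```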


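2) Calculating Spearman's rank correlation coefficient:

rank of x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
rank of y | 3 | 1 | 5 | 2 | 4 | 6 | 8 | 7 | 9 | 10

Table 2

$\sum x = \sum y = 55$;
$\sum x^2 = 385$;
$\sum y^2 = 385$;
$\sum xy = 377$;


$r_S = 0.90303$.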
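As a sketch of this step, the ranks can be computed and passed to the same formula (9). The helper names `ranks` and `pearson` are our own, and the ranking helper assumes there are no tied values, which holds for the Table 1 sample.

```python
import math

# The ten (x, y) pairs of Table 1.
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22.0, 33.0]


def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson(xs, ys):
    """Formula (9), computed from the raw sums."""
    n = len(xs)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    s_yy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


rank_x, rank_y = ranks(x), ranks(y)
r_s = pearson(rank_x, rank_y)  # Spearman = Pearson applied to the ranks
print("rank of x:", rank_x)
print("rank of y:", rank_y)
print("r_S =", round(r_s, 5))
```

For data with ties, a different ranking rule (for example, average ranks) would be needed; that case does not arise here.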


3.1) Replacing $x_i$ by $x_i - \bar{X}$ and $y_i$ by $y_i - \bar{Y}$ for all $i$ (deviations from the mean):

x | -29.7 |  -28.7 | -23.7 |  -21.7 | -12.7 |  5.3 | 17.3 | 24.3 | 33.3 |  36.3
y | -9.93 | -11.33 | -6.13 | -10.33 | -9.53 | 2.87 | 8.27 | 5.97 | 9.57 | 20.57

Table 3


Here $\sum x = 0$, because

$$\sum x = \sum_{i=1}^{10} (x_i - \bar{X}) = (x_1 + x_2 + \ldots + x_{10}) - 10\bar{X} = 357 - 10 \cdot 35.7 = 0;$$

similarly, $\sum y = 0$;
$\sum x^2 = 6{,}244.10$;
$\sum y^2 = 1{,}089.06$;
$\sum xy = 2{,}479.29$;


$r_{mean} = 0.95075$.


3.2) Replacing $x_i$, $y_i$ by their deviations from the minimum, $x_i - x_{\min}$ and $y_i - y_{\min}$,
we have a translation of axes again.

x |   0 | 1 |   6 |   8 |  17 |   35 |   47 |   54 |   63 |   66
y | 1.4 | 0 | 5.2 | 1.0 | 1.8 | 14.2 | 19.6 | 17.3 | 20.9 | 31.9

Table 4


$\sum x = 297$;
$\sum y = 113.3$;
$\sum x^2 = 15{,}065$;
$\sum y^2 = 2{,}372.75$;
$\sum xy = 5{,}844.30$;
$r_{min} = 0.95075$.


3.3) Replacing $x_i$, $y_i$ by their deviations from the maximum, $x_{\max} - x_i$ and $y_{\max} - y_i$:

x |   66 |   65 |   60 |   58 |   49 |   31 |   19 |   12 |    3 | 0
y | 30.5 | 31.9 | 26.7 | 30.9 | 30.1 | 17.7 | 12.3 | 14.6 | 11.0 | 0

Table 5


$\sum x = 363$;
$\sum y = 205.7$;
$\sum x^2 = 19{,}421$;
$\sum y^2 = 5{,}320.31$;
$\sum xy = 9{,}946.20$;
$r_{max} = 0.95075$.


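3.4) Replacing $x_i$ by $x_i - x_4$ and $y_i$ by $y_i - y_4$ (in this case $k = 4$), with $(x_4, y_4) = (14, 2.1)$:

x |  -8 |   -7 |  -2 | 0 |   9 |   27 |   39 |   46 |   55 |   58
y | 0.4 | -1.0 | 4.2 | 0 | 0.8 | 13.2 | 18.6 | 16.3 | 19.9 | 30.9

Table 6


$\sum x^2 = 10{,}953$;
$\sum y^2 = 2{,}156.15$;
$\sum xy = 4{,}720.9$;


$r_4 = 0.95075$, and the same value $r_k = 0.95075$ is obtained for any $k \in \{1, 2, \ldots, 10\}$.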
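The four shifted versions of the sample can be checked together. The sketch below rebuilds each shifted data set from the Table 1 values and applies formula (9) to it; the dictionary labels and the helper name `pearson` are our own.

```python
import math

# The ten (x, y) pairs of Table 1.
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22.0, 33.0]


def pearson(xs, ys):
    """Formula (9), computed from the raw sums."""
    n = len(xs)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    s_yy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
shifted = {
    "3.1 mean":    ([v - mean_x for v in x], [v - mean_y for v in y]),
    "3.2 minimum": ([v - min(x) for v in x], [v - min(y) for v in y]),
    "3.3 maximum": ([max(x) - v for v in x], [max(y) - v for v in y]),
    "3.4 k = 4":   ([v - x[3] for v in x], [v - y[3] for v in y]),  # index 3 is the 4th pair
}

base = pearson(x, y)
results = {label: pearson(sx, sy) for label, (sx, sy) in shifted.items()}
for label, value in results.items():
    print(label, round(value, 5))
```

Every line prints the same value as Pearson's coefficient for the original data, up to floating-point rounding.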


Similarly, we get the same result, equal to $r_P$, if we replace in Pearson's formula (9):


3.5) Each $x_i$ by its deviation from $x$'s median, and each $y_i$ by its deviation from $y$'s
median.


3.6) Each $x_i$ by its deviation from $x$'s standard deviation, and each $y_i$ by its deviation
from $y$'s standard deviation.


3.7) Each $x_i$ by $x_i + a$ (where $a$ is any number), and each $y_i$ by $y_i + b$ (where $b$ is any
number).


3.8) Each $x_i$ by $x_i * a$ (where $a$ is any non-zero number and "*" is either division or
multiplication), and each $y_i$ by $y_i * b$ (similarly for $b$ and "*"). The result equals $r_P$
when $a$ and $b$ have the same sign; if their signs differ, only the sign of the coefficient
changes.


Since the cases 3.5-3.7 are similar to 3.1-3.4, let's consider two examples for the case 3.8:


3.8.1) Suppose each $x_i$ in the original example, Table 1, is divided by 5, while each $y_i$ is
divided by 2. Then:


$\sum x = 71.4$;
$\sum y = 62.15$;
$\sum x^2 = 759.56$;
$\sum y^2 = 658.528$;
$\sum xy = 691.68$;


$r_{(division,\ division)} = 0.95075$.


3.8.2) Now, let's still divide each $x_i$ in Table 1 by 5, but this time multiply each $y_i$ by 2.
Then:


$\sum x = 71.4$;
$\sum y = 248.6$;
$\sum x^2 = 759.56$;
$\sum y^2 = 10{,}536.44$;
$\sum xy = 2{,}766.72$;


$r_{(division,\ multiplication)} = 0.95075$.
So, again these results coincide with Pearson's. 
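The two rescaling examples can be verified in the same way. The sketch below also includes one extra case, not in the article, in which the two scale factors have opposite signs; it is added only to show that the coefficient then keeps its size but changes sign. The helper name `pearson` is our own.

```python
import math

# The ten (x, y) pairs of Table 1.
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22.0, 33.0]


def pearson(xs, ys):
    """Formula (9), computed from the raw sums."""
    n = len(xs)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    s_yy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


base = pearson(x, y)

# 3.8.1: x divided by 5, y divided by 2.
r_div_div = pearson([v / 5 for v in x], [v / 2 for v in y])

# 3.8.2: x divided by 5, y multiplied by 2.
r_div_mul = pearson([v / 5 for v in x], [v * 2 for v in y])

# Extra illustration (our own): factors of opposite sign flip the sign of r.
r_opposite = pearson([v * -1 for v in x], [v * 2 for v in y])

print(round(r_div_div, 5), round(r_div_mul, 5), round(r_opposite, 5))
```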


More interesting alternative correlation coefficients, which give results different from
both Pearson's and Spearman's, are obtained as follows:


A mixture of Pearson's and Spearman's correlation coefficients.


4.1. We only replace $x_i$ by its rank among the $x$'s, while $y_i$ remains unchanged:

x rank |   1 |   2 |   3 |   4 |   5 |    6 |    7 |    8 |    9 |   10
y      | 2.5 | 1.1 | 6.3 | 2.1 | 2.9 | 15.3 | 20.7 | 18.4 | 22.0 | 33.0

Table 7

$\sum x = 55$;
$\sum y = 124.3$;
$\sum x^2 = 385$;
$\sum y^2 = 2{,}634.11$;
$\sum xy = 958.4$;


$r_{SP} = 0.91661 \in [0.90303, 0.95075]$.





4.2. Similarly, let's only replace $y_i$ by its rank among the $y$'s, while $x_i$ remains
unchanged.

x      | 6 | 7 | 12 | 14 | 23 | 41 | 53 | 60 | 69 | 72
y rank | 3 | 1 |  5 |  2 |  4 |  6 |  8 |  7 |  9 | 10

Table 8


$\sum x = 357$;
$\sum y = 55$;
$\sum x^2 = 18{,}989$;
$\sum y^2 = 385$;
$\sum xy = 2{,}636$;


$r_{PS} = 0.93698 \in [0.90303, 0.95075]$.


Both mixture correlation coefficients give results different from Pearson's and
Spearman's; in fact, they lie between the two.
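Both mixtures can be computed by ranking only one of the two variables before applying formula (9). This is a minimal sketch on the Table 1 sample; the helper names `ranks` and `pearson` are our own, and the ranking helper assumes no tied values.

```python
import math

# The ten (x, y) pairs of Table 1.
x = [6, 7, 12, 14, 23, 41, 53, 60, 69, 72]
y = [2.5, 1.1, 6.3, 2.1, 2.9, 15.3, 20.7, 18.4, 22.0, 33.0]


def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pearson(xs, ys):
    """Formula (9), computed from the raw sums."""
    n = len(xs)
    s_xy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(a * a for a in xs) - sum(xs) ** 2 / n
    s_yy = sum(b * b for b in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


r_p = pearson(x, y)                 # Pearson: values against values
r_s = pearson(ranks(x), ranks(y))   # Spearman: ranks against ranks
r_sp = pearson(ranks(x), y)         # mixture 4.1: ranks of x, values of y
r_ps = pearson(x, ranks(y))         # mixture 4.2: values of x, ranks of y

for name, value in [("r_P", r_p), ("r_S", r_s), ("r_SP", r_sp), ("r_PS", r_ps)]:
    print(name, round(value, 5))
```

For this sample both mixtures fall between Spearman's and Pearson's values; the sketch does not show that this ordering holds for every data set.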


Conclusion: 
In samples where the rank of a discrete variable matters more than its values,
this mixture of correlation coefficients brings better results than Pearson's or Spearman's.